Parallel Algorithms for Top-k Query Processing
Abstract
The general problem of answering top-k queries can be modeled using lists of objects sorted by their local scores. Fagin et al. introduced the "middleware cost" as a cost measure for top-k query algorithms and proposed the efficient sequential Threshold Algorithm (TA). However, because datasets can be extremely large, the middleware cost of sequential TA may be intolerable. In this paper, we therefore propose parallel algorithms for processing top-k queries and analyze their middleware costs. A naive parallel algorithm, called PTA (Parallel-TA), evenly partitions the original dataset D into P subdatasets, where P is the number of processors. Each processor finds the top-k results of its subdataset using TA, and these results are then merged to obtain the final top-k answers. Motivated by the idea of partitioning objects, we take a further step and partition D into n subdatasets according to their degree of domination. Based on this partition, we propose the EPTA (Enhanced-PTA) algorithm. Under the PRAM-CRCW model, the middleware cost of PTA is $O(nm^2/P)$, while the average middleware cost of EPTA is $O(km^2 (\ln n)^{m-1}/(m-1)!)$ under the assumption that scores in different lists are independently distributed, where n is the dataset size and m is the number of lists. Extensive experiments show that the speedup ratios of EPTA are significantly higher than those of PTA.
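The partition, local TA, and merge steps that the abstract describes for PTA can be illustrated with a small sketch. This is not the authors' implementation: the function names `threshold_algorithm` and `pta`, the dictionary-based data layout, and the choice of sum as the monotone aggregation function are all assumptions made for illustration, and the P processors are simulated by a sequential loop. EPTA is not sketched because the abstract does not give enough detail about its domination-based partition.

```python
import heapq
from typing import Dict, List, Tuple

# Assumed layout: object id -> tuple of its m local scores.
Scores = Dict[str, Tuple[float, ...]]


def threshold_algorithm(data: Scores, k: int) -> List[Tuple[float, str]]:
    """Sequential TA, assuming sum as the monotone aggregation function."""
    if not data or k <= 0:
        return []
    m = len(next(iter(data.values())))
    # One list per attribute, sorted by local score in descending order.
    lists = [sorted(data, key=lambda o, j=j: data[o][j], reverse=True)
             for j in range(m)]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap holding the best k (score, id)
    for depth in range(len(data)):
        for j in range(m):
            obj = lists[j][depth]            # sorted access
            if obj not in seen:
                seen.add(obj)
                total = sum(data[obj])       # random access to the other lists
                if len(top) < k:
                    heapq.heappush(top, (total, obj))
                elif total > top[0][0]:
                    heapq.heapreplace(top, (total, obj))
        # Threshold: aggregate of the last local scores seen under sorted access.
        threshold = sum(data[lists[j][depth]][j] for j in range(m))
        if len(top) == k and top[0][0] >= threshold:
            break                            # no unseen object can do better
    return sorted(top, reverse=True)


def pta(data: Scores, k: int, p: int) -> List[Tuple[float, str]]:
    """Naive PTA sketch: even partition, local TA per part, then merge."""
    ids = list(data)
    # Round-robin split into p subdatasets of nearly equal size.
    parts = [{o: data[o] for o in ids[i::p]} for i in range(p)]
    # Each call below would run on its own processor in a real deployment.
    local = [threshold_algorithm(part, k) for part in parts]
    return heapq.nlargest(k, (pair for result in local for pair in result))
```

Because every global top-k object is also among the top-k of its own subdataset, merging the P local results and keeping the k largest aggregate scores yields the same answer as running TA on the whole dataset, up to ties.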
Similar resources
Parallel Probing of Web Databases for Top-k Query Processing
A “top-k query” specifies a set of preferred values for the attributes of a relation and expects as a result the k objects that are “closest” to the given preferences according to some distance function. In many web applications, the relation attributes are only available via probes to autonomous web-accessible sources. Probing these sources sequentially to process a top-k query is inefficient, ...
As-Soon-As-Possible Top-k Query Processing in P2P Systems
Top-k query processing techniques provide two main advantages for unstructured peer-to-peer (P2P) systems. First, they avoid overwhelming users with too many results. Second, they significantly reduce network resource consumption. However, existing approaches suffer from long waiting times. This is because top-k results are returned only when all queried peers have finished processing the query....
eSPAK: Top-K Spatial Keyword Query Processing in Directed Road Networks
Given a query location and a set of query keywords, a top-k spatial keyword query ranks objects based on the distance to the query location and textual relevance to the query keywords. Several solutions have been proposed for top-k spatial keyword queries in Euclidean space. However, few algorithms study top-k keyword queries in undirected road networks where every road segment is undirected. Ev...
Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)
Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic warfare, sonar, etc. Beamforming methods such as Minimum Variance Distortionless Response (MVDR) and Delay-and-Sum (DAS), and the subspace-based Multiple Signal Classification (MUSIC), are the best-known DOA estimation techniques. These methods have high computational complexity. Hence using...
A Detailed Evaluation of Threshold Algorithms for Answering Top-k queries in Peer-to-Peer Networks
Ranking queries, also known as top-k queries, have drawn considerable attention due to their usability in various applications. Several algorithms have been proposed for the evaluation of top-k queries. A large percentage of them follow the Threshold Approach. In p2p networks, top-k query processing can provide a lot of advantages both in time and bandwidth consumption. We focus on the main ada...